Abstract

To calibrate Fourier analysis of $S_5$ ranking data by Markov chain Monte Carlo
techniques, a set of moves (a Markov basis) is needed. We calculate this basis and
use it to provide a new statistical analysis of two data sets. The calculation involves
a large Gröbner basis computation (45,825 generators), but reduction to a minimal
basis and reduction by natural symmetries leads to a remarkably small basis (14
elements). Although the Gröbner basis calculation is infeasible for $S_6$, we exploit the
symmetry of the problem to calculate a Markov basis for $S_6$ with 7,113,390 elements
in 58 symmetry classes. We improve a bound on the degree of the generators for a
Markov basis for $S_n$ and conjecture that this ideal is generated in degree 3.



1 Election data with five candidates 



Table 1 shows the results of an election. A population of 5738 voters was asked 
to rank five candidates for president of a national professional organization. 
The table shows the number of voters choosing each ranking. For example, 29 
voters ranked candidate 5 first, candidate 4 second, . . . , and candidate 1 last, 
resulting in the entry 54321 = 29. Table 2 shows a simple summary of the 
data: the proportion of voters ranking candidate $i$ in position $j$. For example,
28.0% of the voters ranked candidate 3 first and 23.1% of the voters ranked 
candidate 3 last. 
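
The first order summary can be recomputed directly from the ranking counts. The following is a minimal sketch, not the authors' code. It assumes that the digit in position $c$ of a ranking is the rank given to candidate $c$ (so 54321 gives candidate 5 rank 1 and candidate 1 rank 5); with the counts of Table 1 this reading gives 1609 of 5738 voters (about 28.0%) ranking candidate 3 first. The helper names `parse_counts` and `first_order_counts` are illustrative.

```python
# Table 1 transcribed as "ranking votes" pairs, four pairs per line.
TABLE_1 = """
54321 29 43521 91 32541 41 21543 36
54312 67 43512 84 32514 64 21534 42
54231 37 43251 30 32451 34 21453 24
54213 24 43215 35 32415 75 21435 26
54132 43 43152 38 32154 82 21354 30
54123 28 43125 35 32145 74 21345 40
53421 57 42531 58 31542 30 15432 40
53412 49 42513 66 31524 34 15423 35
53241 22 42351 24 31452 40 15342 36
53214 22 42315 51 31425 42 15324 17
53142 34 42153 52 31254 30 15243 70
53124 26 42135 40 31245 34 15234 50
52431 54 41532 50 25431 35 14532 52
52413 44 41523 45 25413 34 14523 48
52341 26 41352 31 25341 40 14352 51
52314 24 41325 23 25314 21 14325 24
52143 35 41253 22 25143 106 14253 70
52134 50 41235 16 25134 79 14235 45
51432 50 35421 71 24531 63 13542 35
51423 46 35412 61 24513 53 13524 28
51342 25 35241 41 24351 44 13452 37
51324 19 35214 27 24315 28 13425 35
51243 11 35142 45 24153 162 13254 95
51234 29 35124 36 24135 96 13245 102
45321 31 34521 107 23541 45 12543 34
45312 54 34512 133 23514 52 12534 35
45231 34 34251 62 23451 53 12453 29
45213 24 34215 28 23415 52 12435 27
45132 38 34152 87 23154 186 12354 28
45123 30 34125 35 23145 172 12345 30
"""


def parse_counts(text):
    """Turn whitespace-separated 'ranking votes' pairs into a dict."""
    tokens = text.split()
    return {tokens[i]: int(tokens[i + 1]) for i in range(0, len(tokens), 2)}


def first_order_counts(counts, n=5):
    """m[c][r] = number of voters who gave candidate c+1 the rank r+1.

    Assumption: the digit in position c of a ranking string is the rank
    assigned to candidate c+1.
    """
    m = [[0] * n for _ in range(n)]
    for ranking, votes in counts.items():
        for candidate, digit in enumerate(ranking):
            m[candidate][int(digit) - 1] += votes
    return m


counts = parse_counts(TABLE_1)
total = sum(counts.values())
summary = first_order_counts(counts)
# Percentages, one row per candidate and one column per rank.
percent = [[100 * x / total for x in row] for row in summary]
for row in percent:
    print(" ".join(f"{x:5.1f}" for x in row))
```

Every row and every column of `summary` sums to the number of voters, which is the magic square property used later in the paper.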



Table 2 is a natural summary of the 120 numbers in Table 1, but is it an 
adequate summary? Does it capture all the "juice" in the data? In this paper, 

Table 1

American Psychological Association voting data: the number of voters that ranked
the 5 candidates in a given order.

Ranking  Votes    Ranking  Votes    Ranking  Votes    Ranking  Votes
54321       29    43521       91    32541       41    21543       36
54312       67    43512       84    32514       64    21534       42
54231       37    43251       30    32451       34    21453       24
54213       24    43215       35    32415       75    21435       26
54132       43    43152       38    32154       82    21354       30
54123       28    43125       35    32145       74    21345       40
53421       57    42531       58    31542       30    15432       40
53412       49    42513       66    31524       34    15423       35
53241       22    42351       24    31452       40    15342       36
53214       22    42315       51    31425       42    15324       17
53142       34    42153       52    31254       30    15243       70
53124       26    42135       40    31245       34    15234       50
52431       54    41532       50    25431       35    14532       52
52413       44    41523       45    25413       34    14523       48
52341       26    41352       31    25341       40    14352       51
52314       24    41325       23    25314       21    14325       24
52143       35    41253       22    25143      106    14253       70
52134       50    41235       16    25134       79    14235       45
51432       50    35421       71    24531       63    13542       35
51423       46    35412       61    24513       53    13524       28
51342       25    35241       41    24351       44    13452       37
51324       19    35214       27    24315       28    13425       35
51243       11    35142       45    24153      162    13254       95
51234       29    35124       36    24135       96    13245      102
45321       31    34521      107    23541       45    12543       34
45312       54    34512      133    23514       52    12534       35
45231       34    34251       62    23451       53    12453       29
45213       24    34215       28    23415       52    12435       27
45132       38    34152       87    23154      186    12354       28
45123       30    34125       35    23145      172    12345       30

Table 2

First order statistics: the proportion (in percent) of voters who ranked candidate $i$
in position $j$. This is a scaled version of the Fourier transform of Table 1 at the
permutation representation.

                          Rank
Candidate      1       2       3       4       5
    1        18.3    26.4    22.8    17.4    14.8
    2        13.5    18.7    24.6    24.6    18.3
    3        28.0    16.7    13.8    18.2    23.1
    4        20.4    16.9    18.9    20.2    23.3
    5        19.6    21.0    19.6    19.2    20.3

we develop tools to answer such questions using Fourier analysis and algebraic 
techniques. 



In Section 2, we give a general exposition of how noncommutative Fourier
analysis can be used to analyze group valued data with summary given by
a representation $\rho$. In order to use Markov chain Monte Carlo techniques to
calibrate the Fourier analysis, we define an exponential family and toric ideal
associated to a finite group $G$ and integer representation $\rho$. A generating set
of the toric ideal can be used to run a Markov chain to sample from data on
the group. For example, the 14 moves in Table 3 allow us to randomly sample
from the space of data on $S_5$ with fixed first order summary (Table 2).

In Section 3 we show how this basis (Table 3) was computed, either using
Gröbner bases or by utilizing symmetry. We describe extensive computations
of the basis for ranked data on at most 6 objects. From these computations,
we conjecture that the toric ideal for $S_n$ is generated in degree 3. In Section 4,
we show that this ideal for $S_n$ is generated in degree $n - 1$, improving a result of
Diaconis and Sturmfels (1998), and we describe the degree 2 moves. Finally,
in Section 5, we apply these methods to analyze the data in Table 1 and an
example from Diaconis and Sturmfels (1998).



2 Fourier analysis of group valued data 



Let $G$ be a finite group (in our example, $G = S_5$). Let $f\colon G \to \mathbb{Z}$ be any
function on $G$. For example, if $g_1, g_2, \ldots, g_N$ is a sample of points chosen from
a distribution on $G$, take $f(g)$ to be the number of sample points $g_i$ that are
equal to $g$. We view $f$ interchangeably as either a function on the group or an
element of the group ring $\mathbb{Z}[G]$. Recall that a map $\rho\colon G \to GL(V_\rho)$ is a matrix
representation of $G$ if $\rho(st) = \rho(s)\rho(t)$ for all $s, t \in G$. The dimension $d_\rho$ of

Table 3

$S_5$ moves: there are 29890 moves in 14 symmetry classes of sizes 200-7200.
Each move is a difference of two tableaux; a tableau is written with its rows
separated by slashes.

Move                                                      Number
[53412 / 54321] - [53421 / 54312]                            450
[45231 / 54312] - [45312 / 54231]                            600
[54123 / 54231 / 54312] - [54132 / 54213 / 54321]            200
[53412 / 54123 / 54231] - [53421 / 54132 / 54213]           3600
[45123 / 54231 / 54312] - [45132 / 54213 / 54321]            200
[45123 / 53412 / 54231] - [45132 / 53421 / 54213]           3600
[43512 / 54123 / 54231] - [43521 / 54132 / 54213]           3600
[43512 / 53241 / 54123] - [43521 / 53142 / 54213]           3600
[45231 / 52341 / 53412] - [45312 / 52431 / 53241]           7200
[45132 / 52341 / 53412] - [45312 / 52431 / 53142]           3600
[34512 / 45123 / 53241] - [34521 / 45213 / 53142]            600
[34521 / 45213 / 53142] - [35142 / 43521 / 54213]            600
[35142 / 43521 / 54213] - [35241 / 43512 / 54123]            600
[34521 / 45312 / 52143] - [35142 / 42513 / 54321]           1440

the representation $\rho$ is the dimension of $V_\rho$ as a $\mathbb{C}$-vector space. We say that
$\rho$ is integer-valued if $\rho_{ij}(g) \in \mathbb{Z}$ for all $g \in G$ and for all $1 \le i, j \le d_\rho$. We
denote the set of irreducible representations of $G$ by $\widehat{G}$.



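An analysis of $f(g)$ may be based on the Fourier transform. The Fourier transform
of $f$ at $\rho$ is

\[ \hat{f}(\rho) = \sum_{g \in G} f(g)\,\rho(g). \tag{1} \]

The Fourier transform at all the irreducible representations $\rho \in \widehat{G}$ determines
$f$ through the Fourier inversion formula

\[ f(g) = \frac{1}{|G|} \sum_{\rho \in \widehat{G}} d_\rho \operatorname{Tr}\bigl(\hat{f}(\rho)\,\rho(g^{-1})\bigr), \tag{2} \]

which can be rewritten as $f(g) = \sum_{\rho \in \widehat{G}} f|_{V_\rho}(g)$, where

\[ f|_{V_\rho}(g) = \frac{d_\rho}{|G|} \sum_{h \in G} \chi_\rho(h)\, f(hg) \tag{3} \]

and $\chi_\rho(h) = \operatorname{Tr}\rho(h)$ is the character of $\rho$.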



This decomposition shows the contributions to $f$ from each of the irreducible
representations of $G$. For example, if a few of the $f|_{V_\rho}$ are large, we can analyze
these components in order to understand the structure of $f$. See Diaconis
(1988, 1989) for background, proofs, and previous literature.

Example 1 This analysis is most familiar for the cyclic group $C_n$ where it
becomes the discrete Fourier transform

\[ \hat{f}(j) = \sum_{k=0}^{n-1} f(k)\, e^{-2\pi i jk/n}, \qquad f(k) = \frac{1}{n} \sum_{j=0}^{n-1} \hat{f}(j)\, e^{2\pi i jk/n}. \tag{4} \]

In (4), if a few of the $\hat{f}(j)$ are much larger than the rest, then $f$ is well
understood as approximately a sum of a few periodic components.
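
The transform pair in (4) can be checked numerically. The sketch below is an illustration only: it evaluates both sums directly in pure Python, and the six counts on $C_6$ are made-up values, not data from the paper.

```python
import cmath


def dft(f):
    """Forward transform of (4): fhat(j) = sum_k f(k) exp(-2 pi i j k / n)."""
    n = len(f)
    return [
        sum(f[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]


def inverse_dft(fhat):
    """Inverse transform of (4): f(k) = (1/n) sum_j fhat(j) exp(2 pi i j k / n)."""
    n = len(fhat)
    return [
        sum(fhat[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
        for k in range(n)
    ]


f = [3, 1, 4, 1, 5, 9]          # hypothetical counts on the cyclic group C_6
fhat = dft(f)
recovered = inverse_dft(fhat)   # agrees with f up to floating point error
```

The coefficient `fhat[0]` is the total count, the analogue of the trivial representation in the general decomposition.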

Table 4

Squared length (divided by 120) of the projection of the APA data into the 7 isotypic
subspaces of $S_5$.

         $S^5$   $S^{4,1}$   $S^{3,2}$   $S^{3,1,1}$   $S^{2,2,1}$   $S^{2,1,1,1}$   $S^{1,1,1,1,1}$
$d^2$      1        16          25           36            25             16               1
Data    2286       298         459           78            27              7               0

Table 5

Second order summary for the APA data

                                      Rank
Candidate    1,2    1,3    1,4    1,5    2,3    2,4    2,5    3,4    3,5    4,5
   1,2      -137    -20     18    140    111     22      4      6    -97    -46
   1,3       476    -88   -179   -209   -147   -169   -160    107    128    241
   1,4      -189     51    113     24     -9     98     99    -65     23   -146
   1,5      -150     57     47     45     43     49     56    -48    -53    -48
   2,3       -42     84     19    -61     30    -16     82    -76    -39     72
   2,4       157    -20    -43    -25    -93    -76    -56      8     38    112
   2,5        22    -44      7     15   -117     69     25     62     99   -138
   3,4      -265     -7     72    199     39    140     85     19    -52   -233
   3,5      -169     10     88     70     78     44     47    -51    -36    -80
   4,5       296    -24   -142   -130     -5   -163   -128     38     -9    267

For the symmetric group $S_n$, the permutation representation assigns permutation
matrices $\rho(\pi)$ to permutations $\pi$. Thus, if $f(\pi)$ is the number of rankers
choosing $\pi$, $\hat{f}(\rho)$ is an $n \times n$ matrix whose $(i, j)$ entry is the number of rankers
ranking item $i$ in position $j$ (as in Table 2). The irreducible representations of
$S_5$ are indexed by the seven partitions of five and are written as $S^\lambda$ where $\lambda$
is a partition of 5. For our data, (2) gives a decomposition of $f$ into 7 parts.
Table 4 shows the lengths of the projection of Table 1 onto the seven isotypic
subspaces of $S_5$.
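
For the two one-dimensional representations (trivial and sign) the entries of Table 4 reduce to a single character sum: the projection of $f$ is $(s/|G|)\chi$ with $s = \sum_g \chi(g) f(g)$, so its squared length divided by $|G|$ is $s^2/|G|^2$. The sketch below illustrates this arithmetic; the function names are ours, and rankings are assumed to be digit strings as in Table 1.

```python
def sign(ranking):
    """Sign of a permutation given as a digit string, via its inversion count."""
    n = len(ranking)
    inversions = sum(
        1 for a in range(n) for b in range(a + 1, n) if ranking[a] > ranking[b]
    )
    return -1 if inversions % 2 else 1


def one_dim_projection_sq(counts, character, group_order):
    """Squared length, divided by |G|, of the projection of `counts` onto a
    one-dimensional representation with the given (real) character."""
    s = sum(character(g) * c for g, c in counts.items())
    return s * s / group_order ** 2


# The trivial character only sees the total number of voters, so a single
# placeholder ranking carrying all 5738 votes gives the same value.
trivial_value = one_dim_projection_sq({"12345": 5738}, lambda g: 1, 120)
```

With 5738 voters and $|S_5| = 120$ the trivial value is $5738^2/120^2 \approx 2286.4$, in agreement with the first entry of Table 4. Passing `sign` as the character and the full table of counts gives the last entry.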

The largest contribution to the data occurs from the trivial representation $S^5$.
We call the projection onto $S^5 \oplus S^{4,1}$ the first order summary; it was shown in
Table 2 above. We see that the projection onto $S^{3,2}$ is also sizable while the
rest of the projections are relatively negligible. This suggests a data-analytic
look at the projection into $S^{3,2}$. Table 5 shows this projection in a natural
coordinate system. This projection is based on the permutation representation
of $S_5$ on unordered pairs $\{i, j\}$. Table 5 is an embedding of a 25 dimensional
space into a 100 dimensional space so that its coordinates are easy to interpret.
See Diaconis (1989) for further explanation.

The largest number in Table 5 is 476 in the {1, 3}, {1, 2} position, corresponding
to a large positive contribution to ranking candidates one and three in the top
two positions. There is also a large positive contribution for ranking candidates
four and five in the top two positions. Since Table 5 gives the projection
of $f$ onto a subspace orthogonal to $S^5 \oplus S^{4,1}$, the popularity of individual
candidates has been subtracted out. We can see the "hate vote" against the
pair of candidates one and three (and the pair four and five) from the last
column. Finally, the negative entries for, e.g., pairs one and four, one and five,
three and four, three and five show that voters don't rank these pairs in the
same way.

The preceding analysis is from Diaconis (1989), which used it to show that
noncommutative spectral analysis could be a useful adjunct to other statistical
techniques for data analysis.

The data is from the American Psychological Association, a polarized group
of academicians and clinicians who are on very uneasy terms (the organization
almost split in two just after this election). Candidates one and three are
in one camp, candidates four and five in the other. Candidate two seems
disliked by both camps. The winner of the election depends on the method
of allocating votes. For example, the Hare system or plurality voting would
elect candidate three. However, other widely used voting methods (Borda's
sum of ranks or Coombs' elimination system) elect candidate one. For details
and further analysis of the data, see Stern.



To explain the perturbation analysis in Section 5, it is useful to consider a 
simple exponential model for group-valued data. 

Definition 2 Let $\rho$ be an integer valued representation of a finite group $G$.
Then the exponential family of $G$ and $\rho$ is given by the family of probability
distributions on $G$

\[ P_\Theta(g) = Z^{-1} e^{\operatorname{Tr}(\Theta \rho(g))} \tag{5} \]

where the normalizing constant is $Z = \sum_{g \in G} e^{\operatorname{Tr}(\Theta \rho(g))}$ and $\Theta$ is an $n \times n$ matrix
(with $n = d_\rho$) of parameters to be chosen to fit the data.

For example let $G = S_n$ and $\rho$ be the usual permutation representation. Then
if $\Theta$ is the zero matrix, $P_\Theta$ is the uniform distribution. If $\Theta_{1,1}$ is nonzero and
$\Theta_{i,j}$ is zero otherwise, the model $P_\Theta$ corresponds to item one being ranked first
with special probability and the rest ranked randomly. Such models have been
studied by Silverberg (1984); Verducci (1982); Diaconis (1989). See Marden
(1995) for a book-length treatment of models for permutation data.
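
For small $n$ the family (5) can be tabulated by brute force. The sketch below is illustrative only. It assumes a permutation is a tuple `pi` with `pi[pos]` the item in position `pos` (items and positions numbered from 0), and it takes the exponent to be the sum of the entries of $\Theta$ at the (item, position) pairs occupied by `pi`; whether this is $\operatorname{Tr}(\Theta\rho)$ or $\operatorname{Tr}(\Theta^{T}\rho)$ depends on the matrix convention chosen for $\rho$.

```python
import math
from itertools import permutations


def exponential_family(theta):
    """Return {pi: P_theta(pi)} for the permutation representation of S_n."""
    n = len(theta)
    weights = {}
    for pi in permutations(range(n)):
        # Exponent: sum of theta[item][position] over the positions of pi.
        exponent = sum(theta[pi[pos]][pos] for pos in range(n))
        weights[pi] = math.exp(exponent)
    z = sum(weights.values())  # normalizing constant Z
    return {pi: w / z for pi, w in weights.items()}


n = 3
uniform = exponential_family([[0.0] * n for _ in range(n)])

# Only the (item 0, position 0) parameter is nonzero: item 0 is ranked
# first with special probability, and the rest is ranked at random.
theta = [[0.0] * n for _ in range(n)]
theta[0][0] = math.log(2)
tilted = exponential_family(theta)
p_item0_first = sum(p for pi, p in tilted.items() if pi[0] == 0)
```

In the tilted model the probability that item 0 comes first is $e^{\theta}/(e^{\theta} + n - 1)$, which is $1/2$ for $\theta = \log 2$ and $n = 3$.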



From the Darmois-Koopman-Pitman Theorem (e.g., Diaconis and Freedman,
1984, Theorem 3.1), we deduce

Proposition 3 The model (5) has the property that a sufficient statistic for $\Theta$
based on data $f(\pi)$ is the Fourier transform $\hat{f}(\rho)$. Furthermore, (5) is the
unique model characterized by this property.



Definition 4 Given a finite group $G$ and an integer valued representation $\rho$
of dimension $d_\rho$, define the toric ideal of $G$ at $\rho$ as $I_{G,\rho} = \ker(\phi_{G,\rho})$, where

\[ \phi_{G,\rho}\colon \mathbb{C}[x_g \mid g \in G] \to \mathbb{C}[t_{ij}^{\pm 1} \mid 1 \le i, j \le d_\rho], \qquad x_g \mapsto \prod_{1 \le i, j \le d_\rho} t_{ij}^{\rho_{ij}(g)}. \]

This ideal is the vanishing ideal of the exponential family from Definition 2.
It will be our main object of study in Sections 3 and 4.

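As suggested by Fisher (1973), tests of goodness of fit of the model (5) should
be based on the conditional distribution of the data $f$ given the sufficient
statistic $\hat{f}(\rho)$. By an elementary calculation,

\[ P_\Theta\bigl(f \mid \hat{f}(\rho)\bigr) = w^{-1} \prod_{\sigma \in G} \frac{1}{f(\sigma)!}, \qquad \text{where } w = \sum_{g \,:\, \hat{g}(\rho) = \hat{f}(\rho)} \; \prod_{\sigma \in G} \frac{1}{g(\sigma)!}, \tag{6} \]

the sum being over functions $g\colon G \to \mathbb{N}$ with the same Fourier transform as $f$.
Observe that the conditional distribution in (6) is free of the unknown parameter
$\Theta$.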



The original justification for the Fourier decomposition is model free (nonparametric).
The first order summary in Table 2 is a natural object to look at
and the second order summary was analyzed because of a sizable projection
to $S^{3,2}$ in Table 4. It is natural to wonder if the second order summary is real
or just a consequence of finding patterns in any set of numbers. To be honest,
the APA data is not a sample (the 5,738 who chose to vote are likely to be
quite different from the bulk of the 100,000 or so APA members). If the first
order summary is accepted "as is", the largest probability model for which
$\hat{f}(\rho)$ captures all the structure in the data is the exponential family (5). It
seems natural to use the conditional distribution of the data given $\hat{f}(\rho)$ as a
way of perturbing things. The uniform distribution on data with fixed $\hat{f}(\rho)$
is a much more aggressive perturbation procedure. Both are computed and
compared in Section 5.



3 Computing Markov bases for permutation data 



To carry out a test based on Fisher's principles, we use Markov chain Monte 
Carlo to draw samples from the distribution (6). 

Table 6

Markov bases for $S_3$ and $S_4$ and the size of their symmetry classes. Each tableau
is written with its rows separated by slashes.

Group   Move                                               Number
$S_3$   [123 / 231 / 312] - [132 / 213 / 321]                   1
$S_4$   [1234 / 2143] - [1243 / 2134]                          18
$S_4$   [2314 / 2431 / 4123] - [2134 / 2413 / 4321]           144
$S_4$   [1324 / 2134 / 3214] - [1234 / 2314 / 3124]            16



Definition 5 A Markov basis for a finite group $G$ and a representation $\rho$ is
a finite subset of "moves" $g_1, \ldots, g_B \in \mathbb{Z}[G]$ with $\hat{g}_i(\rho) = 0$ such that any two
elements in $\mathbb{N}[G]$ with the same Fourier transform at the representation $\rho$ can
be connected by a sequence of moves in that subset.



In Diaconis and Sturmfels (1998) it was explained how Gröbner basis techniques
could be applied to find such Markov bases.

Proposition 6 A generating set of $I_{G,\rho}$ (see Definition 4) is a Markov basis
for the group $G$ and the representation $\rho$.

We will write $I_{S_n}$ for our main example, the ideal of $S_n$ with the permutation
representation $\rho$. The representation $\rho\colon \mathbb{N}[S_n] \to \mathbb{N}^{n^2}$ sends an element of $S_n$ to
its permutation matrix. The elements $b \in \mathbb{N}^{n^2}$ with $\rho^{-1}(b)$ non-empty are the
magic squares, that is, matrices with non-negative integer entries such that all
row and column sums are equal. We write an element $\pi_1 + \cdots + \pi_m \in \mathbb{N}[S_n]$ as a
tableau

\[ \begin{bmatrix} \pi_1(1) & \cdots & \pi_1(n) \\ \vdots & & \vdots \\ \pi_m(1) & \cdots & \pi_m(n) \end{bmatrix}. \]

In this notation, a Markov basis element is written as
a difference of two tableaux. For example, the degree 2 element of the Markov
basis for $S_5$,

\[ \begin{bmatrix} 13452 \\ 14325 \end{bmatrix} - \begin{bmatrix} 13425 \\ 14352 \end{bmatrix}, \]

corresponds to adding one to the entries 13452
and 14325 in Table 1 and subtracting one from the entries 13425 and 14352.
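
Whether a difference of two tableaux is a legal move can be checked mechanically: both tableaux must map to the same magic square, that is, every (label, column) pair must occur equally often in each. A minimal sketch, with tableaux given as lists of digit strings and with function names of our own choosing:

```python
from collections import Counter


def magic_square(tableau):
    """Count how often each (label, column) pair occurs in the tableau."""
    return Counter(
        (label, column) for row in tableau for column, label in enumerate(row)
    )


def is_move(plus, minus):
    """True if the two tableaux lie in the same fiber, so plus - minus is a move."""
    return magic_square(plus) == magic_square(minus)


# The degree 2 example for S_5 from the text.
example_s5 = (["13452", "14325"], ["13425", "14352"])
# The degree 2 move for S_4 and the degree 3 move for S_3 from Table 6.
degree2_s4 = (["1234", "2143"], ["1243", "2134"])
degree3_s3 = (["123", "231", "312"], ["132", "213", "321"])

checks = [is_move(*m) for m in (example_s5, degree2_s4, degree3_s3)]
```

All three checks succeed, while a pair such as `(["123"], ["132"])` is rejected because the two permutations have different permutation matrices.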



At the time of writing Diaconis and Sturmfels (1998), finding a Gröbner basis
for $I_{S_5}$ was computationally infeasible. Due to an increase in computing power
and the development of the software 4ti2 (Hemmecke and Hemmecke, 2003),
we were able to compute a Gröbner and a minimal basis of $I_{S_5}$.

This computation involved finding a Gröbner basis of a toric ideal involving
120 indeterminates. It took 4ti2 approximately 90 hours of CPU time on a
2GHz machine and produced a basis with 45,825 elements. The Markov basis
had 29,890 elements, 1,050 of degree 2 and 28,840 of degree 3; see Tables 3 and
7. Using 4ti2, we have also computed Markov bases of the ideals $I_{S_n}$ for $n = 3$
and $n = 4$; they are shown in Table 6.

Although the calculation for $S_6$ is currently not possible using Gröbner basis
methods, there is a natural group action that reduces the complexity of this
problem. The group $S_n \times S_n$ acts on $\mathbb{N}^{n^2}$ by permuting rows and columns. If we
permute the rows and columns of a magic square, we still have a magic square;
therefore, this action lifts to a group action on the Markov basis of $I_{S_n}$. In
terms of tableaux, one copy of $S_n$ acts by permuting columns of the tableau,
the other acts by permuting the labels in the tableau. We have calculated
orbits under this action; notice that the symmetrized bases are remarkably
small (Table 7).

To calculate a Markov basis for $I_{S_6}$, we had to construct the fiber over every
magic square with sum at most 5 (by Theorem 8) and then pick moves
such that every fiber is connected by these moves (see Sturmfels, 1996, Theorem 5.3).
For degrees 2 and 3 this was relatively straightforward (e.g., there
are 20,933,840 six by six magic squares with sum 3). For these degrees, we
constructed all squares and then calculated orbits of the group action and
calculated the fiber for each orbit (there were 11 orbits in degree 2 and 103 in
degree 3).

However, there are 1,047,649,905 six by six magic squares of degree 4 and
30,767,936,616 of degree 5 (from Beck and Pixton, 2003), so complete enumeration
was not possible. Instead, we first randomly generated millions of
magic squares with sums 4 or 5 using another Markov chain. We broke these
down into orbits, keeping track of the number of squares we had found. For
example, we needed to generate 30 million squares of degree 5 to find a representative
for each orbit. We were left with 2804 orbits for degree 4 and 65481
orbits for degree 5. For degree 5, the proof of Theorem 8 shows that we only
need to consider magic squares with norm squared less than 50, leaving 13196
orbits to check. The fibers were calculated by a depth first search with pruning.
Remarkably, the computation showed that $I_{S_6}$ is generated in degree 3,
see Table 7.

The entire calculation for $S_6$ took about 2 weeks, with the vast majority of
the time spent calculating orbits of degree 5 squares. Our data and code (in
Perl) are available for download at http://math.berkeley.edu/~eriksson
The code could be easily adapted to calculate other Markov bases with a
good degree bound and a large symmetry group. Our calculations and Table 7
suggest the following conjecture:

Conjecture 7 The ideal $I_{S_n}$ is generated in degree 3.



4 Structure of the toric ideal $I_{S_n}$



Theorem 6.1 of Diaconis and Sturmfels (1998) shows that every reverse lexicographic
Gröbner basis of $I_{S_n}$ has degree at most $n$. By considering only
minimal generators and not a full Gröbner basis, we are able to strengthen


Table 7

Number of generators and symmetry classes of generators by degree in a Markov
basis for $I_{S_n}$.

        Degree 2        Degree 3         Degree 4     Degree 5     Degree 6
n       all     sym     all       sym    all   sym    all   sym    all   sym
3         0       0           1     1      0     0      0     0      0     0
4        18       1         160     2      0     0      0     0      0     0
5      1050       2       28840    12      0     0      0     0      0     0
6     57150       7     7056240    51      0     0      0     0      0     0

this degree bound. 

Theorem 8 The ideal $I_{S_n}$ is generated in degree $n - 1$ for $n > 3$.

PROOF. Since we know that $I_{S_n}$ is generated in degree $n$, we need to show
that the fibers over all magic squares with sum $n$ are each connected by moves
of degree $n - 1$ or less. Let $S$ and $T$ be tableaux in $\rho^{-1}(b)$, where $b$ is a magic
square with sum $n$. Suppose that the first row of $S$ and the first row of $T$ differ
in exactly $k$ places. Then we claim that there is a degree $k + 1$ move that can
be applied to $S$ to get a tableau $S' \in \rho^{-1}(b)$ with the same first row as $T$.

To change the first row of $S$ to make it agree with the first row of $T$, we have
to permute $k$ elements of the first row of $S$. But to remain in the fiber, this
means we must also permute (at most) $k$ other rows of $S$. For example, if the
first row of $S$ is $123 \ldots n$ and the first row of $T$ is $213 \ldots n$, we would also have
to pick the row of $S$ with a 2 in the first column and the row with a 1 in the
second column. Once we have picked the (at most) $k$ rows of $S$ that must be
changed, it follows from Birkhoff's theorem (e.g., van Lint and Wilson, 2001,
Theorem 5.5) that we can change these $k$ rows and the first row to make a
new tableau $S' \in \rho^{-1}(b)$ that agrees with $T$ in one row.

We applied a degree $k + 1$ move and are left with $S'$ and $T$ being connected
by a degree $n - 1$ move, so as long as we have $k + 1 \le n - 1$, we are done.
That is, for every pair $(S, T)$ of tableaux in a degree $n$ fiber, we must show
that there is a row of $S$ and a row of $T$ that differ in at most $n - 2$ places.

Given such a pair $(S, T)$, introduce an $n \times n$ matrix $M$ where the entry $M_{ij}$
is the number of places in which row $i$ of $S$ and row $j$ of $T$ agree. Notice that if
$M_{ij} \ge 2$, we have rows $i$ in $S$ and $j$ in $T$ that differ in at most $n - 2$ places
and are done.

Suppose that row $i$ of $S$ is $(\pi_i(1), \ldots, \pi_i(n))$. The row sum $\sum_{j=1}^{n} M_{ij}$ counts
the total number of times that $\pi_i(k)$ appears in column $k$ of $T$, for each $k$. Writing
$b(a, k)$ for the number of rows with label $a$ in column $k$, this is
exactly $\sum_{k=1}^{n} b(\pi_i(k), k)$. Summing over all rows, we see that every entry of $b$
gets counted its cardinality number of times. That is,

\[ \sum_{i,j} M_{ij} = \sum_{i,j} b(i,j)^2 = \|b\|^2. \]

Now since each row of $b$ sums to $n$, we have that $\|b\|^2 \ge n^2$, with equality
only if $b(i,j) = 1$ for all $i, j$. If $\|b\|^2 > n^2$, then one of the $M_{ij}$ must be
larger than 1, and we are done.

Therefore, we only have to consider the fiber over

\[ b_1 = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}. \]

Elements of this fiber are tableaux such that every row and every column is a
permutation of $\{1, \ldots, n\}$ ("Latin squares"). Two tableaux are connected by a
degree $n - 1$ move if they have a row in common. We claim that if $n > 3$, this graph is
connected. (Note that for $n = 3$, there are two components and a degree 3
move for $S_3$, see Table 7.)

For fixed $\nu \in S_n$, the set $T_\nu$ of all tableaux in $\rho^{-1}(b_1)$ that have $\nu$ as a row is
connected by definition. Form the graph $G_n$ where the vertices are elements
$\nu \in S_n$ and there is an edge between $\lambda$ and $\nu$ if $\lambda$ and $\nu$ occur in a tableau
together. Then if this graph is connected, the whole fiber over $b_1$ is connected
by degree $n - 1$ moves.

First, we claim that $\lambda$ and $\nu$ occur together in a tableau if and only if $\lambda$
is a derangement with respect to $\nu$ (i.e., if $\lambda$ and $\nu$ are disjoint from each
other). The derangement condition is clearly necessary. Sufficiency follows
from Birkhoff's theorem: if $\lambda$ is a derangement with respect to $\nu$, then the
square $b_1 - \rho(\lambda) - \rho(\nu)$ has non-negative entries and row and column sums
$n - 2$; therefore, it is the sum of $n - 2$ permutation matrices. Thus, $G_n$ is
the graph where two permutations are connected by an edge when they are
disjoint.

Now note that $[1, 2, \ldots, n-2, n-1, n]$ and $[3, 4, \ldots, n, 1, 2]$ are connected in $G_n$
since the second is a cyclic shift of the first. Then, if $n > 3$, $[3, 4, \ldots, n, 1, 2]$ and
$[1, 2, \ldots, n-2, n, n-1]$ are also connected. Thus $[1, 2, \ldots, n]$ and
$[1, 2, \ldots, n-2, n, n-1]$ are connected, so applying transpositions keeps us in the same
connected component of $G_n$. But $S_n$ is generated by transpositions, so $G_n$ is
connected and therefore $\rho^{-1}(b_1)$ is connected by moves of degree $n - 1$. □
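
The last step of the proof can be checked by brute force for small $n$. The sketch below builds the graph $G_n$ on all permutations, with an edge between two permutations that disagree in every position, and counts its connected components by depth first search. It is only a sanity check for small cases, not part of the proof, and the function names are ours.

```python
from itertools import permutations


def disjoint(a, b):
    """True if the permutations a and b differ in every position."""
    return all(x != y for x, y in zip(a, b))


def count_components(n):
    """Number of connected components of the graph G_n."""
    perms = list(permutations(range(1, n + 1)))
    seen = set()
    components = 0
    for start in perms:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:                      # depth first search from `start`
            current = stack.pop()
            for other in perms:
                if other not in seen and disjoint(current, other):
                    seen.add(other)
                    stack.append(other)
    return components


components_by_n = {n: count_components(n) for n in (3, 4, 5)}
```

For $n = 3$ the graph splits into the three cyclic shifts of 123 and the three cyclic shifts of 132, which is the exceptional case noted in the proof; for $n = 4$ and $n = 5$ the search finds a single component.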



Remark 9 From partial computations with CaTS for $n = 4$, it
appears that every Gröbner basis for $S_4$ contains degree 4 elements, while the
Markov basis for $S_4$ needs only degree 3. Furthermore, our Gröbner basis for $S_5$
contained degree 5 elements. Therefore, it is possible that the degree $n$ Gröbner
basis of Diaconis and Sturmfels (1998) is the Gröbner basis of smallest degree.

While $I_{S_n}$ is difficult to compute, it is easy to classify the degree 2 part of
the Markov basis. To do so, first assume that all entries of the magic square
$b$ are either 1 or 0. Then the squares with non-trivial $\rho^{-1}(b)$ are those that
can be put in a block diagonal form with $k \ge 2$ blocks and each block of
size at least 2. Such a magic square has a fiber of size $2^{k-1}$, corresponding to
choosing, for each block, an orientation of the two permutations that sum to
that block (since the order of the rows in a tableau doesn't matter, there are
only $k - 1$ such choices). Therefore, we need $2^{k-1} - 1$ moves to make such
a fiber connected. It is a standard fact (e.g., Stanley, 1997, Chapter 1) that
the number of partitions of $n$ into $k$ blocks each of size at least 2 (denoted
$p_2(n, k)$) satisfies

\[ \sum_{n \ge 0} p_2(n, k)\, q^n = \prod_{i=1}^{k} \frac{q^2}{1 - q^i}. \]

Then let $D_2(n)$ be the number of degree 2 moves, up to symmetry, in a Markov
basis for $S_n$. If a magic square contains a 2, it can be thought of as coming
from $D_2(n - 1)$, so putting everything together, we have

\[ D_2(n) = D_2(n-1) + \sum_{k=2}^{\lfloor n/2 \rfloor} \bigl(2^{k-1} - 1\bigr)\, [q^{n-2k}] \prod_{i=1}^{k} \frac{1}{1 - q^i}, \]

where $[q^j]\bigl(\sum_i a_i q^i\bigr) := a_j$. For example, $D_2(9) = 47$.
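
The recursion for $D_2(n)$ is easy to evaluate. In the sketch below the coefficient $[q^m] \prod_{i=1}^{k} 1/(1 - q^i)$ is computed as the number of partitions of $m$ into parts of size at most $k$. The starting value $D_2(3) = 0$ is an assumption on our part, consistent with Table 7 (no degree 2 generators for $n = 3$).

```python
def partitions_with_parts_at_most(m, k):
    """Coefficient of q^m in prod_{i=1..k} 1/(1 - q^i), i.e. the number of
    partitions of m into parts of size at most k."""
    coeffs = [1] + [0] * m
    for part in range(1, k + 1):
        for total in range(part, m + 1):
            coeffs[total] += coeffs[total - part]
    return coeffs[m]


def d2(n):
    """Number of degree 2 moves up to symmetry, from the recursion above."""
    if n < 4:
        return 0  # assumed starting value: no degree 2 moves for n <= 3
    value = d2(n - 1)
    for k in range(2, n // 2 + 1):
        value += (2 ** (k - 1) - 1) * partitions_with_parts_at_most(n - 2 * k, k)
    return value


values = {n: d2(n) for n in range(4, 10)}
```

The values for $n = 4, 5, 6$ are 1, 2 and 7, matching the numbers of degree 2 symmetry classes in Table 7, and $n = 9$ gives 47 as stated in the text.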



5 Statistical analysis of the election data 



In order to run a Markov chain fixing $\hat{f}(\rho)$ on data $f$, we use the Markov basis
$\{g_1, \ldots, g_B\}$ as calculated above. Then, starting from $f$, choose $i$ uniformly
in $\{1, 2, \ldots, B\}$ and choose $\epsilon = \pm 1$ with probability 1/2. If $f + \epsilon g_i \ge 0$
(coordinate-wise), the Markov chain moves to $f + \epsilon g_i$. Otherwise, the Markov
chain stays at $f$. This gives a symmetric connected Markov chain on the
data sets with a fixed value of $\hat{f}(\rho)$. As such, it has a uniform stationary
distribution. To get a sample from the hypergeometric distribution (6), the
Metropolis algorithm or the Gibbs sampler can be used (see Liu, 2001).
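
The uniform walk just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used for the paper: data sets are dictionaries from ranking strings to counts, a move is a pair (plus, minus) of tableaux with no ranking repeated inside a tableau, and the example uses the single $S_3$ move of Table 6 with made-up starting counts.

```python
import random
from collections import Counter


def summary(state):
    """First order summary: weighted count of each (digit, position) pair."""
    total = Counter()
    for ranking, count in state.items():
        if count:
            for position, digit in enumerate(ranking):
                total[(digit, position)] += count
    return total


def run_chain(start, moves, steps, rng):
    """Uniform random walk on data sets with a fixed first order summary."""
    state = Counter(start)
    for _ in range(steps):
        plus, minus = rng.choice(moves)       # choose i uniformly
        if rng.random() < 0.5:                # epsilon = -1 reverses the move
            plus, minus = minus, plus
        # Accept only if f + epsilon * g_i stays non-negative.
        if all(state[g] >= 1 for g in minus):
            for g in minus:
                state[g] -= 1
            for g in plus:
                state[g] += 1
    return state


MOVES_S3 = [(["123", "231", "312"], ["132", "213", "321"])]
start = {"123": 2, "231": 1, "312": 1, "132": 1}   # hypothetical data on S_3
end = run_chain(start, MOVES_S3, 1000, random.Random(0))
```

Every accepted step removes one copy of each ranking in one tableau and adds one copy of each ranking in the other, so the first order summary and the total count never change. A Metropolis version would accept a proposed step with probability given by the ratio of the weights in (6).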



Given a symmetrized basis, we can still perform a random walk. Pick, at
random, an element $g$ of $S_n \times S_n$. Pick a move from the symmetrized basis
at random, apply $g$ to it (permuting columns and renaming entries), then use
the resulting move in the Markov chain. This again gives a symmetric Markov
chain that converges to the uniform distribution.

In this section, we apply the Markov basis for $S_5$ to analyze Table 1. The second
and third rows of Table 8 show the average sum of squares for 100 samples from
the hypergeometric distribution (6) (row 2) and from the uniform distribution

Table 8

Squared length (divided by 120) of the projection of the APA data into the 7 isotypic
subspaces of $S_5$. Also, the averages of this projection for 100 random draws for 3
perturbations.

                  $S^5$   $S^{4,1}$   $S^{3,2}$   $S^{3,1,1}$   $S^{2,2,1}$   $S^{2,1,1,1}$   $S^{1,1,1,1,1}$
Data              2286       298         459           78            27              7               0
Hypergeometric    2286       298          16           19            10              6               0
Uniform           2286       298         511          672           436            295              25
Bootstrap         2286       303         469           93            37             13               1



Fig. 1. Distribution of the length of the projection to $S^{3,2}$ with the Metropolis and
uniform random walks. [Two histograms with horizontal axis "Length of projection":
one with axis range about 10 to 25, and one for the uniform walk with axis range
200 to 800.]

(row 3) with $\hat{f}(\rho)$ fixed. Both sets of numbers are based on a Markov chain
simulation using a symmetrized version of the minimal basis. In each case,
starting from the original data set, the chain was run 10,000 steps and the
current function recorded. From here, the chain was run 10,000 further steps,
and so on until 100 functions were recorded. While the running time of 10,000
steps is arbitrary, wide variation in the running time did not appreciably
change the results.

A histogram of the 100 values of the length of the projection into $S^{3,2}$ under
each distribution is shown in Figure 1. These show some variability but
nothing exceptional. The histograms for the other projections are very similar.

Consider first the hypergeometric distribution leading to row 2 of Table 8 and
Figure 1. A natural test of goodness of fit of the model (5) for the APA data
may be based on the conditional distribution of the squared length of the
projection of the data into $S^{3,2}$. From the random walk under the null model,
this should be about $15 \pm 5$. For the actual data, this projection is 459. This
gives a definite reason to reject the null model. Our look at the data projected
into $S^{3,2}$ and the analysis that emerged in Section 2 confirms this conclusion.

Table 9

Length of projections onto the 5 isotypic subspaces for the $S_4$ data and three tests.

              $S^4$   $S^{3,1}$   $S^{2,2}$   $S^{2,1,1}$   $S^{1,1,1,1}$
Data           462       381         268           49              4
Metropolis     462       381         169           37              8
Uniform        462       381         277          228             80
Bootstrap      462       381         269           56              7



In Diaconis and Efron (1985), the uniform distribution of the data conditional
on a sufficient statistic was suggested as an antagonistic alternative to the null
hypothesis when the data strongly rejects a null model. The idea is to help
quantify if the data is really far from the null, or practically close to the null
and just rejected because of a small deviation but a large sample size (see the
discussion in the last section of Diaconis and Efron, 1985). From Figure 1, we
see that the actual projected length 459 is roughly typical of a pick from the
uniform. This affirms the strong rejection of (5) and points to a need to look
at the structure of the higher order projection on its own terms.



An appropriate stability analysis was left open in Diaconis (1989). If the data
in Table 1 were a sample from a larger population, the sampling variability
adds noise to the signal. How stable is the analysis above to natural stochastic
perturbations? One standard approach is shown in the last row of Table 8.
This is based on a bootstrap perturbation of the data in Table 1. Here, the
votes of all 5738 rankers are put in a hat and a sample of size 5738 is drawn
from the hat with replacement to give a new data set. The sum of squares
decomposition is repeated. This resampling step (from the original population)
was repeated 100 times. The entries in the last row of Table 8 show the average
squared length of these projections. We see that they do not vary much from
the original sum of squares. While not reported here, the bootstrap analogue
of the second order analysis in Table 8 was quite stable. We conclude that
sampling variability is not an important issue for this example.
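
One bootstrap replicate of the kind described above can be sketched as follows. The function name and the small data set are illustrative; `random.Random.choices` draws with replacement in proportion to the given weights.

```python
import random
from collections import Counter


def bootstrap_resample(counts, rng):
    """Draw as many votes as in `counts`, with replacement, from the observed
    votes, and return the resampled data set."""
    rankings = list(counts)
    weights = [counts[r] for r in rankings]
    draw = rng.choices(rankings, weights=weights, k=sum(weights))
    return Counter(draw)


observed = {"123": 40, "132": 10, "213": 25, "231": 5, "312": 15, "321": 5}  # hypothetical
replicate = bootstrap_resample(observed, random.Random(1))
```

Repeating this 100 times and recomputing the projections for each replicate gives averages of the kind reported in the last row of Table 8.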



In Diaconis and Sturmfels (1998) an $S_4$ example was analyzed. However, the
data was analyzed using only the uniform distribution, which only tells half
of the story. The analysis under hypergeometric sampling gives an important
supplement. Briefly, a sample of 2262 German citizens were asked to rank order
the desirability of four political goals. The data and a first order summary
appear in Diaconis and Sturmfels (1998). The sizes of the projections for the
data and the random walks appear in Table 9. We have noted a typographical
error in the data: the 2431 entry should be 59.

The projection of the data into the second order subspace $S^{2,2}$ has squared
length 268. The bootstrap analysis (line 4 of Table 9) shows this is stable
under sampling perturbations. The hypergeometric analysis (line 2 of Table 9)
suggests that for the specific data, relatively large projections onto the second
order space are typical, even if the first order model holds. This is quite different
from the previous example. Still, the observed 268 is sufficiently much
larger than 169 that a look at the second order projection is warranted. The
uniform analysis points to the actual projection being typical; this again suggests
a serious look at the second order projection.



As a side remark, the software LattE (De Loera et al., 2003) can be used
to count how many data sets have a given first order summary. For our $S_4$
example, these correspond to lattice points inside a convex polytope with
6285 vertices in $\mathbb{R}^{24}$. LattE computes (in only 523.12 seconds) that there are
11606690287805167142987310121 (approximately $10^{28}$) elements of $\mathbb{N}[S_4]$ with
the same first order summary as our $S_4$ example.



6 Conclusions 



In this paper, we have given a general methodology for studying group valued
data where the summary we are interested in is given by a representation of
the group and analyzed in detail the case of ranked data. This suggests a
family of interesting toric ideals: to each finite group $G$ and representation $\rho$
we associate a toric ideal (Definition 4).

For practical purposes, it would be nice to have a general algorithm to analyze
ranked data with $n$ candidates. We ran Markov chains using just the degree 2
moves, but they seemed to mix very poorly. However, our computations and
Conjecture 7 suggest that finding all (or even some) degree 3 moves in addition
to the degree 2 moves would allow for a good random walk.